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Abstract 

In this paper we examine the claims reserving problem using Tweedie's compound 
Poisson model. We develop the maximum likelihood and Bayesian Markov chain 
Monte Carlo simulation approaches to fit the model and then compare the estimated 
models under different scenarios. The key point we demonstrate relates to the com- 
parison of reserving quantities with and without model uncertainty incorporated 
into the prediction. We consider both the model selection problem and the model 
averaging solutions for the predicted reserves. As a part of this process we also con- 
sider the sub problem of variable selection to obtain a parsimonious representation 
of the model being fitted. 
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1 Claims reserving 



Setting appropriate claims reserves to meet future claims payment cash flows is one of the 
main tasks of non-life insurance actuaries. There is a wide range of models, methods and 
algorithms used to set appropriate claims reserves. Among the most popular methods 
there is the chain-ladder method, the Bornhuetter- Ferguson method and the generahzed 
linear model methods. For an overview, see Wiithrich and Merz (2008) and England and 
Verrall (2002). 

Setting claims reserves includes two tasks: estimate the mean of future payments and 
quantify the uncertainty in this prediction for future payments. Typically, quantifying the 
uncertainty includes two terms, namely the so-called process variance and the (parameter) 
estimation error. The process variance reflects that we predict random variables, i.e. it 
describes the pure process uncertainty. The estimation error reflects that the true model 
parameters need to be estimated and hence there is an uncertainty in the reliability of 
these estimates. In this paper, in addition to these two terms, we consider a third source 
of error/uncertainty, namely, wc analyze the fact that we could have chosen the wrong 
model. That is, we select a family of claims reserving models and quantify the uncertainty 
coming from a possibly wrong model choice within this family of models. 
Such an analysis is especially important when answering solvency questions. A poor model 
choice may result in a severe shortfall in the balance sheet of an insurance company, which 
requires under a risk-adjusted solvency regime an adequate risk capital charge. We analyze 
typical sizes of such risk capital charges within the family of Twccdie's compound Poisson 
models, see Tweedie (1984), Smyth and J0rgensen (2002) and Wiithrich (2003). 
Assume that Yij are incremental claims payments with indices i,j e {0, ...,/} , where 
i denotes the accident year and j denotes the development year. At time /, we have 
observations 



Vi = {Yij; z + j<I} 



(1.1) 



and for claims reserving at time / we need to predict the future payments 



Vj^{Yij; i+j>I,i<I}, 



(1.2) 



2 













Vj 



see Table [T] Hence, the outstanding claims payment at time / is given by 

R = Y.^^= E (1-3) 

i=l i+j>I 

Its conditional expectation at time / is given by 

E[R\Vi] = Y,EmVi\= E[Y,,,\Vi\. (1.4) 

i=l i+j>I 

Hereafter, the summation i + j > I is for i < I. Therefore, we need to predict R and to 
estimate E[R\ T>j\. Assume that R is an appropriate ©/-measurable predictor for R and 
©/-measurable estimator for E [R\ P/]. Then, R is used to predict the future payments 
and is the amount that is put aside in the balance sheet of the insurance company for 
these payments. 

Prediction uncertainty is then often studied with the help of the (conditional) mean square 
error of prediction (MSEP) which is defined by 

(1.5) 

If R is ©/-measurable, the conditional MSEP can easily be decoupled as follows, see 
Wiithrich and Merz (2008), section 3.1: 

msep/^l^^ (^) = Var (i?| ©/) + [e [R\ ©/] - R^ (1.6) 
= process variance -|- estimation error. 

It is clear that the consistent estimator R which minimizes the conditional MSEP is given 
hj R = E [R\ Vj] and is used, hereafter, as the "best estimate" for reserves. Assuming 
the model is parameterized by the parameter vector 6, Var(i?|©/) can be decomposed 

as 

Var(i?|©/) = E[Va.T{R\e,Vi)\Vi]+Va.T{E[R\e,Vi]\Vi) (1.7) 
= average process variance + parameter estimation error. 

These are the two terms that are usually studied when quantifying prediction uncertainties 
in a Bayesian context, where the unknown parameters are modelled stochastically. 
That is, we obtain in the Bayesian context a similar decomposition as in the frequentist 



estimation (II ■6p . In the frequentist approach, the second term in fll.6p is often estimated 
by Var(i?), see for example section 6.4.3 in Wiithrich and Merz (2008). 
As discussed in Cairns (2000), in full generality one could consider several sources of 
model uncertainty, however unlike Cairns (2000) we focus on a specific class of models. 
We consider the setting discussed in Bernardo and Smith (1994) termed M Complete 
modelling. In such a setting the premise is that one considers a set of models in which the 
"truth" exists but is unknown a priori. In this setting we demonstrate the risk associated 
with the model uncertainty which we analyze jointly as a decomposition into two main 
parts. The first involves the uncertainty in the parameterization of the model, this is a 
variable selection problem within a nested model structure in the same vein as discussed 
in Cairns (2000). It relates to finding a trade-off between parsimony and accuracy in the 
estimation. The second source of model uncertainty that we study involves the choice of 
a parameter which determines membership from a spectrum of possible models within 
the Tweedie's compound Poisson family of models. We restrict the analysis to Tweedie's 
compound Poisson models and justify this by assuming we are working in the M Complete 
setting. If we relaxed this assumption and therefore consider competing models not in 
this family, then the analysis would be difficult to interpret and analyze in the manner we 
develop in this paper. The second source of model uncertainty will be considered under 
both a model selection and a model averaging setting, given the first "variable selection" 
uncertainty is resolved. As mentioned in Cairns (2000) achieving such an analysis requires 
advanced simulation methodology. Note, in future work we would also consider the M 
Open modeling framework of Bernardo and Smith (1994) which relaxes the belief that 
the truth lies in the set of models considered and hence introduces additional uncertainty 
associated with the family of models considered. The advanced sampling methodology 
required to study the M Open model setting will be briefly discussed. 
The paper is organised as follows. In section 2, we present Tweedie's compound Poisson 
model and section 3 considers parameter estimation in the model, using the maximum 
Likelihood and Bayesian Markov chain Monte Carlo approaches for a real data set. Having 
addressed the variable selection question in section 4, we then analyze claims reserve 
estimation and model uncertainty in both a frequentist and Bayesian setting in section 5. 
We finish with conclusions from our findings. 



4 



2 Tweedie's compound Poisson model 



We assume that Yij belongs to the family of Tweedie's compound Poisson models. Below 
we provide three different parameterizations for Tweedie's compound Poisson models, for 
rigorous derivations we refer to J0rgensen and de Souza (1994), Smyth and J0rgensen 
(2002) and Wiithrich (2003). 

Model Assumptions 2.1 (1st Representation) We assume thatYij are independent 
for i,j G {0, . . . , /} and have a compound Poisson distribution 



2nd Representation. The random variable Yij given in fl2.ip belongs to the family of 
Tweedie's compound Poisson models, see Tweedie (1984). The distribution of Yij can be 
reparameterized in such a way that it takes a form of the exponential dispersion family, 
see e.g. formula (3.5) and Appendix A in Wiithrich (2003): 
Yij has a probability weight at given by 




(2.1) 



in which (a) Nij and X\ '^ are independent for all k, (h) Nij is Poisson distributed with 
parameter Xij; (c) xf'^ are independent gamma severities with the mean Tij > and the 
shape parameter 'J > 0. Hereafter, we denote l^} as an indicator function. 



P [Yi,j = 0] = P [Ai,,- = 0] = exp {-0-//^p(^^i,,)} 



(2.2) 



and for y > the random variable Yi^j has continuous density 




(2.3) 



Here 9ij < 0, (pi^ > 0, the normalizing constant is given by 




(2.4) 



and the cummulant generating function Kp{.) is given by 



def. 



1 



[il-p)9] 



7 



(2.5) 



2-p 
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where p e (1, 2) and 7 = (2 — p) / (1 — p) . 

The parameters, in terms of the 1st representation quantities, are: 



P = P(7) = ^ e (1,2), 

7 + i 




(2.6) 



(t). . ^ -hi bl— > 



(2.7) 



(i-p) 



<0, 



(2.8) 



(2.9) 




(2.10) 



Var(yij) = (l)ijKp(eij) = /xf^^.. 



(2.11) 



That is, Yij has the mean /Xjj, dispersion 0j j and variance function with the variance 
parameter p. The extreme cases p — > 1 and p — > 2 correspond to the overdispersed Poisson 
and the gamma models, respectively. Hence, in this spirit, Tweedie's compound Poisson 
model with p e (1,2) closes the gap between the Poisson and the gamma models. Often 
in practice, p is assumed to be known and fixed by the modeller. The aim of this paper is 
to study Model Uncertainty, that is, we would like to study the sensitivity of the claims 
reserves within this subfamily, i.e. Tweedie's compound Poisson models (which are now 
parameterized through p). This answers model uncertainty questions within the family of 
Tweedie's compound Poisson models. In this paper the restriction on p G (1, 2) is taken in 
the context of practical application of these models to claims reserving, Wiithrich (2003) 
comments that the majority of claims reserving problems will be captured under this 
assumption. However, in general, in the exponential dispersion family p can be outside 
of the (1,2) range, e.g. p = produces a Gaussian density and p — 3 leads to an inverse 
Gaussian model. 

3rd Representation. Utilizing the above definitions, the distribution of Yij can be 
rewritten in terms of //jj, p and (pij as 




(2.12) 
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and for y > 



1—p '2—p 

y — 

I — p 2—p 



(2.13) 



3 Parameter estimation 

Our goal is to estimate the parameters /lij, p and (f)ij based on the observations Vj. In 
order to estimate these parameters we need to introduce additional structure in the form 
of a multiplicative model. 

Model Assumptions 3.1 Assume that there exist exposures a — {ag, . . . ,ai) and a 
development pattern /3 = (/3o, ■ ■ ■ , Pi) such that we have for all i, j & {0, . . . , 1} 

= tti Pj- (3.1) 
Moreover, assume that (pij — (f) and ai > 0, (5j > 0. 

In addition, we impose the normalizing condition cto = 1, so that the estimation problem 
is well-defined. That is we have (27 + 3) unknown parameters p, 0, a, f3 that have to be 
estimated from the data Vj. Next we present the hkehhood function for this model and 

then develop the methodology for parameter estimation using the maximum likelihood 
and Bayesian inference methods. 

3.1 Likelihood function 

Define the parameter vector 6 = (p, 0, a,/3). Then the likelihood function for Y^j, i + j< 
I, is given by 

LvAO) = n 0,P) exp k-^^ - ^^^1 } > (3-2) 

i+j<i ^ ^ P J ) 

where we set c(0; (f),p) — 1 for Yij — 0. The difficulty in the evaluation of the hkelihood 
function is the calculation of c{y; (f),p) which contains an infinite sum 

<r,^^^)-EXjM;^)'Tr^y^E^ (3.3) 
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where 7 = 7 (p) = (2 — p) / (1 — p). Tweedie (1984) identified this summation as Wright's 
(1935) generahzed Bessel function, which can not be expressed in terms of more common 
Bessel functions. To evaluate this summation we follow the approach of Dunn and Smyth 
(2005) which directly sums the infinite series, including only terms which significantly 
contribute to the summation. Consider the term 

log Wr = r log z — log r (1 + r) — log F {jr) , 

where 

^_(i/0r^v_ 

' (p-l)^(2-p)- 

Replacing the gamma functions using Stirling's approximation and approximating ^yr by 
7r + 1 we get 

log Wr^r {log 2; + (1 + 7) - 7 log 7 - (1 + 7) log r} - log (27r) - ^ log 7 - log r, 

which is also a reasonable approximation for small r. Treating r as continuous and taking 
the partial derivative w.r.t. r gives 

d log Wr 



r 



log z — log r — 7 log (7r) . 



dr 

Hence, the sequence Wr is unimodal in r. Solving dWr/dr = 0, to find (approximately) 
the maximum of Wr, results in the approximate maximum lying close to 

i?o = i?o(0,p) = -^2~^- ^^-^^ 

This gives a surprisingly accurate approximation to the true maximum of Wr, r e N. 
Finally, the aim is to find Rl < Rq < Ru such that the following approximation is 
sufficiently accurate for the use in the evaluation of the likelihood terms, 

^ Ru 

c{y; (t),p) ^ c{y; (t),p) ^ - V Wr- (3.5) 

^ r=RL 

The fact that d log Wr/dr is monotonic and decreasing implies that logFF^ is strictly 
convex in r and hence the terms in Wr decay at a faster rate than geometric on either 
side of Rq. Dunn and Smyth (2005) derive the following bounds, 

1 - 0-^^"^ 1 

civ; <t>,p) - civ; <t>,p) < Wr,^,—^ + Wr^+i- (3.6) 

i- — Ql i- — Qu 
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with 



qL = exp 



dlogWr 
dr 



, qu = exp 

r=RL-l 



d log Wr 

dr 



(3.7) 

r=Ru+l 



These bounds are typically too conservative since the decay is much faster than geometric. 
In practice, an adaptive approach balancing accuracy and efficiency is to continue adding 
terms either side of the maximum until the lower and upper terms satisfy the double 
precision constraints Wr^ ^ c^^^Wrq (or = 1) and Wr^ ^ e~^'^WRQ. When evaluating 
the summation for c{y; it was important to utilize the following identity to perform 
the summation in the log scale to avoid numerical overflow problems, 

/ Ru 

logc(y; = - logy + log Wr^ + log I exp (log {Wr) - log {WrJ) 

\r=RL 

We made an additional observation when analyzing this model. For our data set, as p 
approaches 1 (i.e. when the distribution approaches the overdispersed Poisson model) the 
likelihood may become multimodal. Therefore, to avoid numerical complications in actual 
calculations, we restrict to p ^ 1.1. At the other extreme, when p = 2 the number of terms 
required to evaluate c{y; 4>,p) may become very large, hence to manage the computation 
burden, we restrict p ^ 1.95. These limitations are also discussed in Dunn and Smyth 
(2005). For our data set, we checked that this restriction did not have a material impact 
on the results. 



3.2 Maximum likelihood estimation 

The maximum likelihood estimator (MLE) for the parameters is given by maximizing 

L-Dj{0) m = {p,(f),a, (3) under the constraints a, > 0, /9j > 0, > and p G (1,2). 
This leads to the MLEs 0^^^ = (p^^E^ ^le^ ^^mle^ ^mle^ ^^^^ estimate 

reserves for R, given Dj, 

^MLE ^ J2 ^L^. (3.8) 

i+j>I 

A convenient practical approach to obtain the MLEs is to use the fact that at the maxi- 
mum of the likehhood, (3 are expressed through a and p according to the following set of 
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equations, p G (1,2): 

I-k 



=0 

I-k 



(3k='-=hi ' fc = 0,...,/, (3.9) 



obtained by setting partial derivatives 



2-p 



= - a^-'(3l-^) (3.10) 

equal to zero. Hence, after maximizing the likelihood in cx.,p,(j) one then calculates the 
set of equations (13. 9p for the remaining parameters utilizing the normalization condition 
ao = 1. 

Under an asymptotic Gaussian approximation, the distribution of the MLEs is Gaussian 
with the covariance matrix elements 

/ 2MLE 2MLE 



cov(e^^fL^j^(/-^)^^^., (3.11) 

where / is Fisher's information matrix that can be estimated by the observed information 
matrix 



(3.12) 



It is interesting to note that, 3/^'"^ = Yq j. Also, it is easy to show (using fl3.10p and 
(13. lip ) that J^f^^^ is orthogonal to all other parameters, i.e. 

C0V(®^LE^ ^LE) ^ ^LE ^MLE_ ^3) 

The next step is to estimate the parameter estimation error in the reserve as a function 
of the parameter uncertainty. We do this via propagation of error by forming a Taylor 
expansion around the MLEs, see England and Verrall (2002) formulae (7.6)-(7.8) and 
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Wiithrich (2003) formulae (5.1)-(5.2), 



stdev (^^^L^) = ^Var (^^le^ (3. 14) 

y^/^MLEA ^ ^ gMLEgMLE^^^/^LE^^LEA (g^g) 

^ ' il+jl>Ii2+j2>I ^ ^ ^ ' 

+ >, p. /?. cov a. , a. 

+ 2 V V S^^LE^LE^OvfS^LE^LEy 

n+ji>-f«2+j2>-f ^ ^ 

Additionally, using the independence assumption on and (2.11), the process variance 
is estimated as 

Var (/?) = 5^ (af ^L^) 0^^^. (3.16) 

i+j>i 

Then the conditional MSEP (11.61) is estimated by 

^^R\vj (r^^"") = Va^ (i?) + Va^ (r^^^'^ (3.17) 
= MLE process variance + MLE estimation error. 

Note that, in practice, typically MLE is done for a fixed p (expert choice) and hence 
model selection questions are neglected. In our context it means that the expert chooses 
p and then estimates ck^^^, /^^le ^mle (^^^^ ^^^^ Wiithrich (2003), section 4.1). 
The case p = 1 corresponds to the overdispersed Poisson model and provides the chain- 
ladder estimate for the claims reserves (see Wiithrich and Merz (2008), section 2.4). It 
is important to note that, often the dispersion parameter (j) is estimated using Pearson's 
residuals as 

1 (V ;^MLE3MLE\2 

where N is the number of observations Yi^j in Vj and k is the number of estimated 
parameters a^, (3j (see e.g. Wiithrich and Merz (2008), formula (6.58)). Also note that 
for a given p, _R^^le gjygj^ \yy p gp does not depend on (p and the estimators for the process 
variance (13.161) and estimation error (13.151) are proportional to cj). Next we present the 
Bayesian model which provides the posterior distribution of the parameters given the 
data. This will be used to analyze the model uncertainty within Tweedie's compound 
Poisson models. 



11 



3.3 Bayesian inference 

In a Bayesian context all parameters, p, 0, ctj > and f3j > 0, are treated as random. 
Using Bayesian inference we adjust our a priori beliefs about the parameters of the model 
utilizing the information from the observations. Through the Bayesian paradigm we are 
able to learn more about the distribution of p, 0, ex. and /3 after having observed Vj. 
Our a priori behefs about the parameters of the model are encoded in the form of a prior 
distribution on the parameters 7r(0). Then the joint density of T>i = {Yij > 0;i + j < /} 
and — {p, 4>, a., j3) is given by 

Li,,(6>) 7r(6>). (3.19) 

Now applying Bayes' law, the posterior distribution of the model parameters, given the 
data T>i, is 

7r(6> I Vi) (X L-v,{e) 7r(0). (3.20) 

Usually, there are two problems that arise in this context, the normalizing constant of this 
posterior is not known in closed form. Additionally, generating samples from this posterior 
is typically not possible using simple inversion or rejection sampling approaches. In such 
cases it is usual to adopt techniques such as Markov chain Monte Carlo (MCMC) methods, 
see for example Gilks et al. (1996) and Robert and Casella (2004) for detailed expositions 
of such approaches. 

The Bayesian estimators typically considered are the Maximum a Postiori (MAP) esti- 
mator and the Minimum Mean Square Estimator (MMSE), that is the mode and mean 
of the posterior, defined as follows: 

MAP : e^^^ = argmax[7r(6> | Vj)] , (3.21) 



MMSE : e^MSE ^E[d\Vi]. (3.22) 

We mention here that if the prior 7r{0) is constant and the parameter range includes the 
MLE, then the MAP of the posterior is the same as the MLE. Additionally, one can 
approximate the posterior using a second order Taylor series expansion around the MAP 
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estimate as 



ln7r(6> I VA ^ Invrf^ 



MAP 



1 V- 

+ 9E 



2 ^ d9,d9. 



Irnrid \ Vj] 



0=0MAP 



- {e^ - ef^p) . (3.23) 



This corresponds to 7r(0 | Vj) approximated by the Gaussian distribution with the mean 
and covariance matrix calculated as the inverse of the matrix 

92 



ln7r(6> I Vi] 



(3.24) 



0=0MAP 

which in the case of diffuse priors (or constant priors defined on a large range) compares 
with the Gaussian approximation for the MLEs fl3.11l) - p.l2p . 

In the Bayesian context, the conditionally expected future payment, for Model Assump- 
tions 3.1, is given by 

E[R\Vi]= J2 E[a,f3,\Vj]. (3.25) 
i+j>i 

Denote the expected reserves, given the parameters 6, by 



R = E[R\d] = J2 

i+j>i 

Then, the best consistent estimate of reserves (ER) is given by 



(3.26) 



R 



J2 E[a,P,\Vj] = E[R\Vi] 
i+j>i 



(3.27) 



which is, of course, a ©/-measurable predictor. Hence, the conditional MSEP is simply 



msep^lP^ [R ] = E 



R-R' 



VaT(R\Vi) . 



(3.28) 



This term, in the Bayesian approach for Tweedie's compound Poisson model, is decom- 
posed as, see also (11. 7p . 

Var (i?| Vj) = Var 1 ^ Yij vA = ^ E [{aiPjf ^\ Vi] + Var (^^ vi^ . (3.29) 

\i+j>I J i+j>I 

Hence, we obtain the familiar decoupling into average process variance and estimation 
error. However, in addition we incorporate model uncertainty within Tweedie's compound 
Poisson model, which enters the calculation by the averaging over all possible values of 
the variance parameter p. 
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3.4 Random walk Metropolis Hastings-algorithm within Gibbs 

In this section we describe an MCMC method to be used to sample from the posterior 
distribution fl3.20p . The following notations are used: = {p,(f),ct, /B) is the vector of 
parameters; U {a,b) is the uniform distribution on the interval (a, 6); fN{x]fi,a) and 
F/v (x; /i, 0") are the Gaussian density and distribution correspondingly with the mean 
/X G M and standard deviation cr > at position a; G M. 

Prior Structure: We assume that all parameters are independent under the prior distri- 
bution 7r(0) and all distributed uniformly with 9i ~ U {ai,bi). The prior domains we used 
for our analysis were p G (1.1, 1.95), (p G (0.01, 100), ai G (0.01, 100) and f3j G (0.01, 10^). 
These are reasonable ranges for the priors in view of our data in Table [2] and correspond- 
ing to the MLEs in Table [31 Other priors such as diffuse priors can be applied with no 
additional difficulty. The choice of very wide prior supports was made with the aim of 
performing inference in the setting where the posterior is largely implied by the data. 
Subsequently, we checked that making the ranges wider does not affect the results. 

Next we outline a random walk Metropolis-Hastings (RW-MH) within Gibbs algorithm. 
This creates a reversible Markov chain with the stationary distribution corresponding 
to our target posterior distribution fl3.20p . That is, we will run the chain until it has 
sufficiently converged to the stationary distribution (=posterior distribution) and in doing 
so we obtain samples from that posterior distribution. It should be noted that the Gibbs 
sampler creates a Markov chain in which each iteration of the chain involves scanning 
either deterministically or randomly over the variables that comprise the target stationary 
distribution of the chain. This process involves sampling each proposed parameter update 
from the corresponding full conditional posterior distribution. The algorithm we present 
generates a Markov chain that will explore the parameter space of the model in accordance 
with the posterior mass in that region of the parameter space. The state of the chain at 
iteration t will be denoted by 0^ and the chain will be run for a length of T iterations. 
The manner in which MCMC samplers proceed is by proposing to move the ith parameter 
from state 6'*"^ to a new proposed state 9*. The latter will be sampled from an MCMC 
proposal transition kernel fl3.30p . Then the proposed move is accepted according to a 
rejection rule which is derived from a reversibility condition. This makes the acceptance 
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probability a function of the transition kernel and the posterior distribution as shown 
in (13.311) . If under the rejection rule one accepts the move then the new state of the 
ith parameter at iteration t is given by Oj = 6*, otherwise the parameter remains in the 
current state = 6\^^ and an attempt to move that parameter is repeated at the next 
iteration. In following this procedure, one builds a set of correlated samples from the 
target posterior distribution which have several asymptotic properties. One of the most 
useful of these properties is the convergence of ergodic averages constructed using the 
Markov chain samples to the averages obtained under the posterior distribution. 
Next we present the algorithm and then some references that will guide further investi- 
gation into this class of simulation methodology. Properties of this algorithm, including 
convergence results can be found in the following references Casella and George (1992), 
Robert and Casella (2004), Gelman et al. (1995), Gilks et al. (1996) and Smith and 
Roberts (1993). 



Random Walk Metropolis Hastings (RW-MH) within Gibbs algorithm. 

1. Initialize randomly or deterministically for t = the parameter vector 6^ (e.g. MLEs). 

2. For t = 1,...,T 

a) Set 61* = 6»*-i 

b) For i = 1, . . . , 2/ + 3 

Sample proposal 0* from Gaussian distribution whose density is truncated below and 
above hi and given by 

In [fi . t^i. (^RWi) - p (h.nt ^ ^ F (n -fft ^ N l-^-^^^ 

to obtain e* = {e\, eu, e*, e'-^l, ..). 

Accept proposal with acceptance probability 

„.^ . U I 'Dl)fJ,(l)i»l''RW,) \ ,,,,, 



where 7r(0* | Vj) is given by fl3:20|) . That is, simulate U ~ t/(0, 1) and set e\ = 9* if 

u < a{e\e*). 

=^ Note that in fl3.3ip the normalizing constant of the posterior 7t{0 \ Vj) from (13.201) 
is not needed. 
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Reiricirk. The RW-MH algorithm is simple in nature and easily implemented. However, 
if one does not choose the proposal distribution carefully, then the algorithm only gives 
a very slow convergence to the stationary distribution. There have been several studies 
regarding the optimal scaling of proposal distributions to ensure optimal convergence 
rates. Gelman et al. (1997), Bedard and Rosenthal (2007) and Roberts and Rosenthal 
(2001) were the first authors to publish theoretical results for the optimal scaling problem 
in RW-MH algorithms with Gaussian proposals. For (/-dimensional target distributions 
with i.i.d. components, the asymptotic acceptance rate optimizing the efficiency of the 
process is 0.234 independent of the target density. In this case we recommend that 
the selection of anwi are chosen to ensure that the acceptance probability is roughly 
close to 0.234. This number is the acceptance probability obtained for asymptotically 
optimal acceptance rates for RW-MH algorithms when applied to multidimensional target 
distributions with scaling terms possibly depending on the dimension. To obtain this 
acceptance rate, one is required to perform some tuning of the proposal variance prior to 
final simulations. An alternative approach is to utihze a new class of Adaptive MCMC 
algorithms recently proposed in the literature, see Atchade and Rosenthal (2005) and 
Rosenthal (2007), but these are beyond the scope of this paper. 

3.5 Markov chain results and analysis 

This section presents the results comparing both MLE and Bayesian estimates for the 
parameters of Twccdic's compound Poisson model. It is also demonstrated how additional 
information in a Bayesian framework can be obtained through the complete knowledge 
of the target posterior distribution obtained from the MCMC algorithm described above. 
In this regard we demonstrate how this additional information can be exploited in the 
claims reserving setting to provide alternative statistical analysis not obtainable if one 
just considers point estimators. We also analyze model averaging solutions in section 
5. These can be obtained by forming estimates using the information given by the full 
posterior distribution tt {6 \ T>j) that we find empirically from the MCMC samples. 
The maximum likelihood and MCMC algorithms were implemented in Fortran. The 
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maximization routine for the MLEs utilizes the direct search algorithm DBCPOL (that 
requires function evaluation only) from the IMSL numerical library. Note that, gradient 
based optimization routines such as the BFGS algorithm can be more efficient, but the 
direct search algorithm we used was sufficient for our problem in terms of computing time 

4 seconds on a typical desktop PC^). 
The algorithm was analyzed on synthetic data and found to provide correct estimates. 
In particular with uniform priors the MAP estimates of the parameters are the same as 
the MLEs, up to numerical errors. This was confirmed for different sized claims triangles. 
The actual data set studied in this paper is presented in Table [2l The data we study is 
the standard data set used in Wiithrich and Merz (2008) scaled by 10,000. 
The results presented for the Bayesian approach were obtained after pretuning the Markov 
chain random walk standard deviations, CRWi ? to produce average acceptance probabilities 
of 0.234. Then the final simulation was for 10^ iterations from a Markov chain (^ 17min^) 
in which the first lO'' iterations were discarded as burnin when forming the estimates. 
The pretuned proposal standard deviations <7RWi are presented in Table [31 The first set of 
results in Table [3] demonstrates the MLE versus the Bayesian posterior estimator MMSE 
for all model parameters. Included are the [5%, 95%] predictive intervals for the Bayesian 
posterior distribution. The MLE standard deviations are calculated using 03. lip . The 
numerical standard errors (due to a finite number of MCMC iterations) in the Bayesian 
estimates are obtained by blocking the MCMC samples post burnin into blocks of length 
5000 and using the estimates on each block to form the standard error (these are given 
in brackets next to the estimates). 

The next set of analysis demonstrates the performance of the MCMC approach in con- 
verging to the stationary distribution given by the target posterior 7r(0 | P/). To analyze 
this, in Figure [T], we present the trace plots for the Markov chain for the parameters, 
{p,(j),ai, Po). Also, in Figure [21 we demonstrate the marginal posterior distribution his- 
tograms and pair-wise posterior scatter plots for (p, 0, ai, Po, c^i, Pi)- The lower panels in 
Figure [2l are the scatter plots for the pair-wise marginal posteriors, the diagonal contains 
the marginal posteriors and the upper panels contains the correlations between parame- 
ters. These plots demonstrate strong linear correlations between several parameters. Some 
lintel® Core^^^2 Duo, 2.13GHz processor. 
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of these correlations are similar to MLE correlations calculated using fl3.11|] . For example, 
we found that under the posterior distribution p{p, (p) ^ —0.82 and p(/5o, ai) ~ —0.63, see 
Figure [21 are similar to p(p^^^, 0^^^) ^ -0.94 and p{P^^^,af^^) ^ -0.68 correspond- 
ingly. However, we also observed that under the posterior distribution p{p,f3i) ~ —0.17 
and p{(j),Pi) ~ 0.23, see Figure [2], while corresponding MLE correlations are zero, see 

(EH. 



4 Variable selection via posterior model probabilities 

In the development so far it has been assumed that variable selection is not being per- 
formed, that is we are assuming that the model is known and we require parameter 
estimates for this model. This is equivalent to specifying that the number of a and f3 
parameters is fixed and known in advance. We now relax this assumption and will demon- 
strate how the variable selection problem can be incorporated into our framework. The 
procedure we utilize for the variable selection is based on recent work of Congdon (2006) 
and specifies the joint support of the posterior distribution for the models and parameters 
under the product space formulation of Carlin and Chib (1995). 

In this section we consider the subset of nested models which create homogenous blocks 
in the claims reserving triangle (/ = 9) for the data set in Table 2. 

• Mq : 0[o] = (^p, 0, 5o = ao, . . . ,ai = «/, Pq = Pq, . . . , Pi = Pi^ - saturated model. 

• Ml : 0[i] = (^p, 0, Po^ with ^/5o = /3o = ■■■ = ("o = ••• = "/ = 1) • 

• M2 : 0[2] = (^p, 0, 5i, Po, Pi^ with (ao = • • • = "4 = 1), (5i = as = • • • = «/), 
(Po = Po = . . . = Pa), (Pi = = . . . = Pi). 

• M3 : 0[3] = (^p, 0, 5i, 52, Po, Pi,P2) with (ao = tti = 1), (5i = 02 = . . . = 05), 

(52 = ae = • • • = «/), (po = Po = Pi), (pi = P2 = ■ ■ ■ = P5), {^2 = p6 = ■ ■ ■ = Pi) ■ 

• M4 : 0[4] = (^p, 0, 5i, 52, 53, Po, pi,P2, ^3) with (ao = Oi = 1), (5i = 02 = "3), 

(52 = 04 = as = ae), (53 = a? = as = a/), ^^0 = Po = {^i = P2 = P3), 
{P2 =Pa = P5 = Pe), {P3 = P7 = P8 = Pi) . 
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■■ ^[5] = (p, 0, Oil, "2, as, a4, Po, Pi, P2, Ps, PaJ with (oq = ai = 1), (ai = 02 = "3 
= 04 = as), ("3 = "6 = «?), = "8 = «/), (^0 = /?o = Pi), (Pi = P2 = P3 

P2 = Pa = /^s), (^3 = P6 = P7), {Pa = Ps = P 



• Me : 0[6] = (p, 0, ao, "i, /5o, A, • • • , with (en = ai = . . . = a/) . 

Now, to determine the optimal model, we first consider the joint posterior distribution for 
the model probability and the model parameters denoted 7i{Mk,6[k] \ l^i), where 6[k] = 
(^^i,[k],02,[k], ■ ■ ■ ,ON^^,[k]j is the parameter vector for model [k]. Additionally we denote 
the prior bounds for 6'j [fc] as . We assume a prior distribution vr (Mfc) for the 

model selection and a prior for the parameters conditional on the model vr (^0[k] \ Mk) . 
It is no longer possible to run the standard MCMC procedure we described in section 
3.4 for this variable selection setting. This is because the posterior is now defined on 
either a support consisting of disjoint unions of subspaces or a product space of all such 
subspaces, one for each model considered. A popular approach to run Markov chains 
in such a situation is to develop a more advanced sampler than that presented above, 
typically in the disjoint union setting. This involves developing a Reversible Jump RJ- 
MCMC framework, see Green (1995) and the references therein. This type of Markov 
chain sampler is complicated to develop and analyze. Hence, we propose as an alternative 
in this paper to utilize a recent procedure that will allow us to use the above MCMC 
sampler we have already developed for a model Mk- The process we must follow involves 
first running the sampler in the simulation technique described in section 3.4 for each 
model considered. Then the calculation of the posterior model probabilities n{Mk \ T>i) 
is performed using the samples from the Markov chain in each model to estimate (14.30 . 
Furthermore, our approach here removes the assumption on the priors across models, 
made by Congdon (2006), p. 348, 

TT{di,n]\ Mk) = l,m^k (4.1) 

and instead we work with the prior 



N, 



[m] 



-1 



7r(0[„] I M,) = n h- , , - , , ^^^k. (4.2) 

1 = 1 

That is, instead we use a class of priors where specification of priors for a model Mk 
automatically specifies priors for any other model. This is a sensible set of priors to 
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consider given our product space formulation and it has a clear interpretation in our 
setting where we specify our models through a series of constraints, relative to each other. 
In doing this we also achieve our goal of having posterior model selection insensitive to 
the choice of the prior and being data driven. The modified version of Congdon's (2006), 
formula A. 3, we obtain after relaxing Congdon's assumption, allows the calculation of 
the posterior model probabilities 7r(Mjt | D/) using the samples from the Markov chain in 



each model to estimate 



7r(Mfe \Vi)^ J 7r(Mfe,6>[fe] | Vi)deik] = J 7r(Mfe | 6>[fc], P,)7r(6>[fc] | Vi)c 



T-T,, 

K 

T Lv,{Mk,ej^ik])iiA^m\ ^kH^k) 

T — Th ^ -- ^ 



'^=^^+^ELo^2'.(^-,%H)n^(0.-,[fe] I M„)7r(Mj 

fe=0 



3=Tb+l 



1 ^ LDj{Mk,6j^[k] 



Here K — 6, and for a proof, see Congdon (2006), formula A. 3. Note that, the prior of 
parameters (given model) contributes in the above implicitly as 0j,[k] are MCMC samples 
from the A;*^ models posterior distribution. In the actual implementation we used T = 
100, 000 and the burnin period — 10, 000. Note, the prior probabilities for each model 
are considered diffuse and are set such that all models a priori are equiprobable, hence 
7r(Mfc) — 1/ (K + 1) and n{dj,[k] \ Mk) is the prior for model M^'s parameters evaluated 
at the j^^ Markov chain iteration. Once we have the posterior model probabilities we 
can then take the MAP estimate for the optimal model (variable selection) for the given 
data set. In this paper we do not consider the notion of model averaging over different 
parameterized models in the variable selection context. Instead we simply utilize these 
results for optimal variable selection from a MAP perspective for the marginal posterior 
7r(Mfc I Vj). 

In addition to this model selection criterion we also consider in the Bayesian framework the 
Deviance Information Criterion (DIC), see Bernardo and Smith (1994). From a classical 
maximum likelihood perspective we present the likelihood ratio (LHR) p-values. 
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Application of this technique to the simulated MCMC samples for each of the considered 
models produced the posterior model probabilities given in Table HI This suggests that 
within this subset of models considered, the saturated model Mq was the optimal model 
to utilize in the analysis of the claims reserving problem, vr (Mq | Vj) ^ 0.7. It is followed 
by model Mg with vr (Mq | Vj) ^ 0.3. Additionally, the choice of Mq was also supported 
by the other criteria we considered: DIG and LHR. 

In future research it would be interesting to extend to the full model space which considers 
all models in the power set 1 0[o] | . This is a large set of models including all combinatorial 
combinations of model parameters for a's and In such cases it is no longer feasible 
to run standard MCMC algorithms in each model since this will involve an impractical 
number of simulations. Hence, more sophisticated model exploration techniques will be 
required such as RJ-MCMC, see Green (1995) or the product space samplers of Carlin 
and Chib (1995). 

We note here that we do not claim Mq is the optimal model in all possible models, only 
in the subset we consider in this section. In saying this we acknowledge that we aim to 
work in the saturated model but consider it important to illustrate how variable selection 
can be performed in this class of models and also raise awareness that this will impact 
the model uncertainty analysis subsequently performed. 

Hence, using these findings and the analysis of the MCMC results for model Mq provided 
above, we may now proceed to analyze the claims reserving problem. Of interest to the 
aim of this paper is the sensitivity of the model choice parameter p to the parameterization 
of the claims reserving triangle. This is particularly evident when one considers the MMSE 
estimate of the model specification parameter p estimated under each model. In the most 
parsimonious, yet inflexible model Mi the estimate obtained was MMSE (p) ^ 1.9, a 
very similar estimate was obtained in models M2, M3, M4 and M5, however, interestingly 
in the saturated model the estimate was MMSE {p) ^ 1.3 which is almost at the other 
extreme of the considered range for which the parameter p is defined. 
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5 Calculation of the claims reserves 



We now demonstrate the results for several quantities in the claims reserving setting, uti- 
lizing the MCMC simulation results we obtained for the Bayesian posterior distribution 
under the variable selection model Mq (saturated model). In particular, we start by not- 
ing that we use uniform prior distributions with a very wide ranges to perform inference 
implied by the data only. In this case, theoretically, the Bayesian MAP (the posterior 
mode) and MLEs for the parameters should be identical up to numerical error due to the 
finite number of MCMC iterations. A large number of MCMC iterations was performed 
so that the numerical error is not material. In general, the use of more informative priors 
will lead to the differences between the MAP and MLE. Some of the MMSE estimates (the 
posterior mean) were close to the MAP estimates, indicating that the marginal posterior 
distributions are close to symmetric. When the posterior is not symmetric, MMSE and 
MAP can be very different. Also, note that the uncertainties in the parameter MLEs are 
estimated using the asymptotic Gaussian approximation fl3.lip - fl3.12p . In the case of con- 
stant priors, this should lead to the same inferences as corresponding Bayesian estimators 
if the posterior distributions are close to the Gaussian approximation, see fl3.23p - fl3.24p . 
In addition, the MLEs for the reserves, estimation error and process variance, see section 
3.2, are based on a Taylor expansion around parameter MLEs assuming small errors. 
In many cases the posterior is materially different from the Gaussian distribution, has 
significant skewness and large standard deviation leading to the differences between the 
MLEs and corresponding Bayesian estimators. Having mentioned this, we now focus on 
the main point of this paper which involves analysis of the quantities in Table [5] related to 
the model uncertainty within Tweedie's compound Poisson models (introduced by fixing 
model parameter p) in a Bayesian setting. 

It is worth noting that point estimates of model parameters are either in the frequentists 
approach MLEs or in a Bayesian approach the MAP or MMSE estimates. These are under 
the auspice that we wish to perform model selection (i.e. selection of p). The focus of this 
paper is to demonstrate the difference in results obtained for reserve estimates that can 
arise by performing model averaging instead of the typical approach of model selection, 
using a priori chosen p. In this regard we perform estimation utilizing the full posterior 



22 



distribution of the parameters and not just point estimators. This allows us to capture the 
influence of the model uncertainty (uncertainty in p), since in a Bayesian setting we can 
account for this uncertainty using the posterior distribution. In particular, the Bayesian 
analysis specifies the optimal p (either in the MAP or the MMSE context) and it also 
provides a confidence interval for the choice of p (see Figure [7]), which corresponds to the 
choice of the optimal model within Tweedie's compound Poisson models. Moreover, we 
demonstrate the impact on the claims reserve by varying p from 1.1 to 1.9 (i.e. for a fixed 
model choice). 

5.1 Results: average over p 

Initially it is worth considering the predicted reserve distribution for the estimator R. 

This is obtained by taking the samples t = 10, 001 to 100, 000 from the MCMC simulation 

{p*, 0*, a*, /3*} and calculating via fl3.26p . The histogram estimate is presented in 

Figure [31 In the same manner, we also estimate the distributions of Rij = aijSj for the 

individual cells of the I x I claims matrix, presented as subplots in Figure HI Note that 

the total observed loss in the upper triangle (~ 9274) is consistent with E[ ^ caPj] 

i+j<i 

and [Var( J2 estimated using the MCMC samples as (^ 9311) and 190) 

i+j<i 

respectively. The maximum likelihood approach results in ^ af^^^ (3^^^ k, 9275 with 

i+j<i 

standard deviation 124 also conforming with the observed total loss. 
Now we focus on quantities associated with the estimated distribution for R to calculate 
the results, see Tabled which can only be estimated once the entire posterior distribution 
is considered. These quantities are the key focus of this paper since they allow assess- 
ment of the conditional MSEP as specified in fl3.28p . In particular, we may now easily 
use the posterior probability samples obtained from the MCMC algorithm to evaluate 
the estimated reserve (ER), the process variance (PV) and the estimation error (EE) in 
the conditional MSEP. This provides an understanding and analysis of the behaviour of 
the proposed model in both the model averaging and model selection (i.e. selection of p) 
contexts whilst considering the issue of model uncertainty, the goal of this paper. The 
Bayesian estimates for ER, P V, EE and MSEP are presented in Table |6l The correspond- 
ing MLEs were calculated using (13.81) . fl3.16p . (13.150 and (I3.17P respectively and presented 
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in Table for comparison. The results demonstrate the following: 

• Claims reserves MLE, R^^^, is less than Bayesian estimate by approximately 
3%, which is the estimation bias of the claims reserve MLE (see also Wiithrich and 
Merz (2008), Remarks 6.15. 

are of the same magnitude, approximately 6-7% of the total claims 

reserves. 

• MLEs for V EE and V PV are less than corresponding Bayesian estimates by ap- 
proximately 37% and 30%, respectively. 

• The difference between R^^^ and R^ is of the same order of magnitude as V EE 
and \/PV and thus is significant. 

Note that we use constant priors with very wide ranges, the MLE uncertainties are calcu- 
lated using an asymptotic Gaussian approximation and numerical error due to the finite 
number of MCMC iterations is not material (also see the 1st paragraph, section 5). The 
observed significant differences between the MLEs and corresponding Bayesian estima- 
tors suggest that our posterior distributions are skewed and materially different from the 
Gaussian distribution. 

We conclude this section with the distribution of R, the total outstanding claims payment, 
see Figure [51 This is obtained from the MCMC samples of the parameters {p, 0, a, (3) 
which we then transform to parameters (A, 7,t) from model representation 1, section 
2, and simulate annual losses in i + j > I. That is, these samples of R are obtained 
from the full predictive distribution / (-R | T>j) = Jg {R \6) it {0 \ T>i) dO, where g {R \6) 
is the distribution of R given by (11. 3p and (12. ip . It takes into account both process 
uncertainty and parameter uncertainty. We note that while reserving by some measure 
of centrality such as R^ may be robust, it will not take into account the distributional 
shape of R. A viable alternative may be Value-at-Risk (VaR) or a coherent risk measure 
such as Expected Shortfall. In Table [7] we demonstrate estimates of the VaR for R and 
R at the 75%, 90% and 95% quantiles. 
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5.2 Results: conditioning on p 

As part of the model uncertainty analysis, it is useful to present plots of the relevant 
quantities in the model selection (selection of p) settings, see Figure O where we present 
ERp = E[R\Vi,p], PVp = E^+j>IE[(p{ai(3,Y \Vj,p] and EEp = VaT{R\Vr,p) as a func- 
tion of p. Figure [6] shows: 

• MLE of ERp is almost constant, varying approximately from a maximum of 603.96 
{p = 1.1) to a minimum of 595.78 {p = 1.9) while the MLE for ER was 602.63. 

• The Bayesian estimates for ERp change as a function of p. Approximately, it ranged 
from a maximum of 646.4 {p = 1.9) to a minimum of 621.1 {p = 1.5) while the 
Bayesian estimator for ER was 624.1. Hence, the difference (estimation bias) within 
this possible model range is ~ 25 which is of a similar order as the process uncer- 
tainty and the estimation error. 

• Bayesian estimators for \/PVp and ^jEEp increase as p increases approximately 
from 33.1 to 68.5 and from 37.4 to 102.0 respectively, while the Bayesian estima- 
tors for \fPV and -/EE are 37.3 and 44.8 correspondingly. Hence, the resulting 
risk measure strongly varies in p which has a large influence on quantitative sol- 
vency requirements. The MLEs for PVp and EEp are significantly less than the 
corresponding Bayesian estimators. Also, the difference between the MLE and the 
Bayesian estimators increases as p increases. 

For interpretation purposes of the above results it is helpful to use the following rela- 
tions between model averaging and model selection quantities (easily derived from their 
definitions in Table [5]): 

ER = E[ERp\Vi\, (5.1) 
PV = E[PVp\Vil (5.2) 
EE = E[EEp\Vi] + Var{ERp\Vj). (5.3) 

Here, the expectations are calculated with respect to the posterior distribution of p. 
The histogram estimate of the later is presented in Figure [7] and highlights significant 
uncertainty in p (model uncertainty within Tweedie's compound Poisson model). 

25 



We also provide Figure [S] demonstrating a Box and Whisker summary of the distributions 
oi R \ p for a range of values of p. This plot provides the first, second and third quartiles as 
the box. The notch represents uncertainty in the median estimate for model comparison, 
across values of p, and the whiskers demonstrate the smallest and largest data points 
not considered as outliers. The outliers are included as crosses and the decision rule to 
determine if a point is an outlier was taken as the default procedure from the statistical 
software package R. 

The conclusion from this section is that if model selection is performed (i.e. p is fixed by the 
modeller), the conditional MSEP will increase significantly if a poor choice of the model 
parameter p is made. In particular, though the median is fairly constant for the entire 
range of p G (1, 2) the shape of the distribution of i? | p is clearly becoming more diffuse 
as p ^ 2. This will lead to significantly larger variance in the reserve estimate. If risk 
measures such as Value-at-Risk are used in place of the mean, it will result in reserves 
which are too conservative (if a poor choice of p is made). Also, using the maximum 
likelihood approach may significantly underestimate the claims reserves and associated 
uncertainties. 

5.3 Over dispersed Poisson and Gamma models 

There are several popular claims reserving models, however we restrict our comparison 
to the overdispersed Poisson and gamma models since they fit into Tweedie's compound 
Poisson framework when p = 1 and p = 2 respectively. Note that the overdispersed 
Poisson model and several other stochastic models lead to the same reserves as the chain 
ladder method but different in higher moments. The detailed treatment of these models 
can be found in e.g. England and Verrall (2002) or Wiithrich and Merz (2008), section 
3.2. 

The MLEs for the reserves and associated uncertainties within the overdispersed Poisson 
and gamma models are provided in Table [HI These results are obtained when the dis- 
persion (j) is estimated by (f)^ using Pearson's residuals (13.181) and when is estimated 
by 0^^^ obtained from the maximization of the likelihood. The results for the first case 
are also presented in Wiithrich and Merz (2008), Table 6.4. Firstly note that, the values 
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of 0^ and 0^LE g^j^g significantly different both for the overdispersed Poisson and gamma 
models. As we mentioned in section 3.2, for a fixed p, the MLE for the reserves does not 
depend on while the estimation error, process variance and MSEP are proportional to 
0. As one can see from Table M, different estimators for the dispersion lead to the same 
estimators for the reserves but very different estimators for the uncertainties. Also note 
that, our MLE calculations for Tweedie's distribution conditional on p, i.e. Figure [6l are 
obtained using 0^^"^ and are consistent with the corresponding results for the overdis- 
persed Poisson and Gamma models when p 1 and p 2 respectively. Though, in the 
case of the overdispersed Poisson we had to use an extended quasi-likelihood to estimate 
0MLE_ Figure [6], we do not show the results based on (f)^ but would like to mention 
that these are always above the MLEs and below the Bayesian estimators for the process 
variance and estimation error and are consistent with corresponding overdispersed Pois- 
son and gamma model limits. Interestingly, the ratio 0-^/0^^^ is approximately 1.4 — 1.5 
for all considered cases of p within a range [1, 2]. 

The MLEs obtained using both 0^le ^^j^j underestimate the uncertainties compared 
to the Bayesian analysis. Note that, while the MLEs for the uncertainties are proportional 
to the dispersion estimator, the corresponding Bayesian estimators are averages over all 
possible values of according to its posterior distribution. The uncertainty in the estimate 
for the dispersion is large which is also highlighted by a bootstrap analysis in Wiithrich 
and Merz (2008), section 7.3. This indicates that should also depend on the individual 
cells However, in this case overparameterization needs to be considered with care 

and Bayesian framework should be preferred. 



6 Discussion 

The results demonstrate the development of a Bayesian model for the claims reserving 
problem when considering Tweedie's compound Poisson model. The sampling methodol- 
ogy of a Gibbs sampler is applied to the problem to study the model sensitivity for a real 
data set. The problem of variable selection is addressed in a manner commensurate with 
the MCMC sampling procedure developed in this paper and the most probable model 
under the posterior marginal model probability is then considered in further analysis. 
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Under this model we then consider two aspects, model selection and model averaging 
with respect to model parameter p. The outcomes from these comparisons demonstrate 
that the model uncertainty due to fixing p plays a significant role in the evaluation of 
the claims reserves and its conditional MSEP. It is clear that whilst the frequentist MLE 
approach is not sensitive to a poor model selection, the Bayesian estimates demonstrate 
more dependence on poor model choice, with respect to model parameter p. We use con- 
stant priors with very wide ranges to perform inference in the setting where the posterior 
is largely implied by data only. Also, we run a large number of MCMC iterations so 
that numerical error in the Bayesian estimators is very small. In the case of the data we 
studied, the MLEs for the claims reserve, process variance and estimation error were all 
significantly different (less) than corresponding Bayesian estimators. This is due to the 
fact that the posterior distribution imphed by the data and estimated using MCMC is 
materially different from Gaussian, i.e. more skewed. 

Future research will examine variable selection aspects of this model in a Bayesian context 
considering the entire set of possible parameterizations. This requires development of 
advanced approaches such as Reversible Jump MCMC and variable selection stochastic 
optimization methodology to determine if a more parsimonious model can be selected 
under assumptions of homogeneity in adjacent columns/rows in the claims triangle. 
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accident 
year i 



development years j 
j 



7-1 
I 



observed random variables Yij e Vj 



to be predicted Yij e Vj 



Table 1: Claims development triangle. 



Year 





1 


2 


3 


4 


5 


6 


7 


8 9 





594.6975 


372.1236 


89.5717 


20.7760 


20.6704 


6.2124 


6.5813 


1.4850 


1.1130 


1.5813 


1 


634.6756 


324.6406 


72.3222 


15.1797 


6.7824 


3.6603 


5.2752 


1.1186 


1.1646 




2 


626.9090 


297.6223 


84.7053 


26.2768 


15.2703 


6.5444 


5.3545 


0.8924 




3 


586.3015 


268.3224 


72.2532 


19.0653 


13.2976 


8.8340 


4.3329 






4 


577.8885 


274.5229 


65.3894 


27.3395 


23.0288 


10.5224 








5 


618.4793 


282.8338 


57.2765 


24.4899 


10.4957 










6 


560.0184 


289.3207 


56.3114 


22.5517 












7 


528.8066 


244.0103 


52.8043 














8 


529.0793 


235.7936 
















9 


567.5568 



















Table 2: Data - annual claims payments Y^j for each accident year i and development 
year j, i + j<9. 
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MLE 


MLE stdev 


Bayesian posterior 


<^RW 


MMSE 


stdev 


[Q0.O5; Qo.Qs] 


p 


1.259 


0.149 


1.332 (0.007) 


0.143 (0.004) 


[1.127;1.590] 


1.61 




0.351 


0.201 


0.533 (0.013) 


0.289 (0.005) 


[0.174;1.119] 


1.94 


ai 


0.918 


0.056 


0.901 (0.004) 


0.074 (0.001) 


[0.778;1.022] 


0.842 


Q!2 


0.946 


0.051 


0.946 (0.003) 


0.073 (0.001) 


[0.833;1.072] 


0.907 


Q:i 


().8()1 


0.048 


().8()1 (o.oo;-!) 


().0()8 (O.OOl) 


0.750:0.977] 


0.849 


Q 1 


0.891 


0.049 


0.902 (O.oo;-!) 


0.072 (().()()2) 


0.794:1. 027] 


0.893 


a 5 


0.879 


0.051 


0.876 (0.003) 


0.070 (0.001) 


[0.768;0.994] 


0.932 


ag 


0.842 


0.048 


0.843 (0.002) 


0.069 (0.001) 


[0.736;0.958] 


0.751 


Q!7 


0.762 


0.046 


0.762 (0.003) 


0.066 (0.001) 


[0.660;0.876] 


0.888 


as 


0.763 


0.047 


0.765 (0.003) 


0.067 (0.001) 


[0.661;0.874] 


0.897 




0.848 


0.059 


0.856 (0.003) 


0.090 (0.002) 


[0.716;1.009] 


1.276 




669.1 


27.7 


672.7 (2.1) 


39.7 (0.7) 


[610.0;740.0] 


296 




329.0 


14.4 


331.1 (1.0) 


20.6 (0.4) 


[298.1;365.9] 


190 




77.43 


4.38 


78.06 (0.24) 


6.10 (0.06) 


[68.58;88.29] 


75.4 




24.59 


1.96 


24.95 (0.08) 


2.64 (0.03) 


[20.89;29.64] 


40.9 




16.28 


1.55 


16.65 (0.05) 


2.09 (0.03) 


[13.44;20.30] 


40.6 




7.773 


1.028 


8.068 (0.024) 


1.356 (0.020) 


[6.064;10.473] 


26.0 


/36 


5.776 


0.937 


6.115 (0.022) 


1.261 (0.016) 


[4.246;8.347] 


24.1 


/37 


1.219 


0.396 


1.494 (0.006) 


0.609 (0.013) 


[0.739;2.609] 


13.1 


/38 


1.188 


0.476 


1.622 (0.008) 


0.802 (0.016) 


[0.674;3.070] 


15.1 


/?9 


1.581 


0.790 


2.439 (0.021) 


1.496 (0.026) 


[0.829;5.250] 


32.1 



Table 3: MLE and Bayesian estimators. (Trw is the proposal standard deviation in the 
MCMC algorithm and [Qo.os; Q0.95] is the predictive interval, where is the quantile of 
the posterior distribution at level a. The numerical standard error, in Bayesian estimators 
due to finite number of MCMC iterations, is included in brackets next to estimates. 
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Mo 


Ml 


M2 


M3 


M4 


M5 


Me 


7r(Mfe 1 Di) 


0.71 


4.19E-54 


3.04E-43 


1.G3E-28 


6.71E-20 


2.17E-21 


0.29 


DIG 


399 


649 


600 


535 


498 


507 


398 


LHR p — value 


1 


2.76E-50 


1.67E-40 


3.53E-28 


5.78E-21 


3.03E-23 


0.043 



Table 4: Posterior model probabilities it {Mk\Dj), Deviance Information Criterion (DIG) 
for variable selection models Mq, . . . , Mq and Likelihood Ratio (LHR) p- values (comparing 
Mo to Ml,..., Me). 





Model Averaging 


Model Selection for p 


Estimated Reserves 
Process Variance 
Estimation Error 


ER = R^^ E[R\Vi] 
PV = E[j:^{a,(3jf\Vj] 
EE = Var(^|X>/) 


ERp ^ E[R\Vi,p] 
PVp^E[Y,c^{a^[iJf\VI,p] 
EEp ^ Yay{R\Vi,p) 



Table 5: Quantities used for analysis of the claims reserving problem under Model Aver- 
aging and Model Selection in respect to p. 





Model Averaging 


Statistic 


Bayesian Estimate 


MLE Estimate 


ER 


624.1 (0.7) 


602.630 


yfPV 


37.3 (0.2) 


25.937 


^/ee 


44.8 (0.5) 


28.336 


Vmsep 


58.3(0.5) 


38.414 



Table 6: Model averaged estimates of claim reserve, process variance and estimation error. 
Numerical error in Bayesian estimates is reported in brackets. See Table |5] for definitions 
of ER, PV, EE and MSEP=EE+PV. 





Model Averaging 


VaRq 


R 


R 


VaR75% 


659.8 (0.9) 


650.6 (1.0) 


VaR9o% 


698.4 (1.2) 


680.4 (1.3) 


VaR95% 


724.0 (1.5) 


701.7 (1.6) 



Table 7: Bayesian model averaged estimates of Value at Risk for outstanding claims 
payment R and claim reserves R. 
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Overdispersed Poisson 


Gamma model 


Statistic 


K I All 




« 0.045 


^MLE ^ g 


ERp 


604.706 


604.706 


594.705 


594.705 




29.829 


24.017 


62.481 


52.162 


\/EEp 


30.956 


24.925 


92.826 


77.496 


^MSEPp 


42.989 


34.613 


111.895 


93.415 



Table 8: The MLEs for the overdispersed Poisson {jp = 1) and Gamma {p = 2) models, 
when the dispersion is estimated as 0^ using Pearson's residuals fl3.18p or 0^^^. 



Intial state of chain 
y Burnin 



Sample Path for p 




2 2.5 3 

Iteration of Markov Chain 



X 10 



Figure 1: Markov chain sample paths (p, 0, ai, /5o). 
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0.2 0.8 1.4 




-0.82 



0.15 



0.7 0.9 1.1 
J I I I L 



-0.36 



-0.02 
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-0.17 



0.23 



-0.06 
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o 

CO 



o 
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o 



o 
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Figure 2: Posterior scatter plots, marginal posterior histograms and linear correlations 
for {p,(j),ai,Po,ai,Pi) . 
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Figure 3: Predicted distribution of reserves, R — ^ OiiPj- 

i+j>I 
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Figure 5: Distribution of total outstanding claims payment R= ^ l^j, accounting for 

i+j>I 

all process, estimation and model uncertainties. 
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660 I — I I \ r 




1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 

Figure 6: Estimates of quantities from Table [5] conditional on p. Note, numerical standard 
errors are not included as they are negligible and are less than the size of the symbols. 



39 



0.12 



T 1 1 1 1 1 r 



0.1 



0.08 



0.06 



0.04 
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Hn-. — 
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Figure 7: Posterior distribution of the model parameter p. 
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Figure 8: Predicted claim reserves R distributional summaries conditional on model pa- 
rameter p. 
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